{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (Visual) Chat Completion inference using Online Endpoints\n",
    "\n",
    "This sample shows how to deploy `Phi-3-vision-128k-instruct' to an online endpoint for inference.\n",
    "\n",
    "### Outline\n",
    "* Set up pre-requisites\n",
    "* Pick a model to deploy\n",
    "* Download and prepare data for inference\n",
    "* Deploy the model for real time inference\n",
    "* Test the endpoint\n",
    "* Test the endpoint using Azure OpenAI style payload\n",
    "* Clean up resources"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Set up pre-requisites\n",
    "* Install dependencies\n",
    "* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace  `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.\n",
    "* Connect to `azureml` system registry"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary modules\n",
    "from azure.ai.ml import MLClient\n",
    "from azure.identity import (\n",
    "    DefaultAzureCredential,\n",
    "    InteractiveBrowserCredential,\n",
    ")\n",
    "\n",
    "try:\n",
    "    # Try to get the default Azure credential\n",
    "    credential = DefaultAzureCredential()\n",
    "    credential.get_token(\"https://management.azure.com/.default\")\n",
    "except Exception as ex:\n",
    "    # If default credential is not available, use interactive browser credential\n",
    "    credential = InteractiveBrowserCredential()\n",
    "\n",
    "try:\n",
    "    # Try to create an MLClient using the provided credential\n",
    "    workspace_ml_client = MLClient.from_config(credential)\n",
    "    subscription_id = workspace_ml_client.subscription_id\n",
    "    resource_group = workspace_ml_client.resource_group_name\n",
    "    workspace_name = workspace_ml_client.workspace_name\n",
    "except Exception as ex:\n",
    "    print(ex)\n",
    "    # If MLClient creation fails, enter the details of your AML workspace manually\n",
    "    subscription_id = \"<SUBSCRIPTION_ID>\"\n",
    "    resource_group = \"<RESOURCE_GROUP>\"\n",
    "    workspace_name = \"<WORKSPACE_NAME>\"\n",
    "\n",
    "# Create an MLClient instance with the provided credentials and workspace details\n",
    "workspace_ml_client = MLClient(\n",
    "    credential, subscription_id, resource_group, workspace_name\n",
    ")\n",
    "\n",
    "# The models, fine tuning pipelines, and environments are available in the AzureML system registry, \"azureml\"\n",
    "registry_ml_client = MLClient(credential, registry_name=\"azureml\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Deploy the model to an online endpoint\n",
    "Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This code checks if the model with the specified name exists in the registry.\n",
    "# If the model exists, it retrieves the first version of the model and prints its details.\n",
    "# If the model does not exist, it prints a message indicating that the model was not found.\n",
    "\n",
    "# model_name: Name of the model to check in the registry\n",
    "model_name = \"Phi-3-vision-128k-instruct\"\n",
    "\n",
    "# Get the list of versions for the specified model name\n",
    "version_list = list(registry_ml_client.models.list(model_name))\n",
    "\n",
    "# Check if any versions of the model exist in the registry\n",
    "if len(version_list) == 0:\n",
    "    print(\"Model not found in registry\")\n",
    "else:\n",
    "    # Get the first version of the model\n",
    "    model_version = version_list[0].version\n",
    "    foundation_model = registry_ml_client.models.get(model_name, model_version)\n",
    "    \n",
    "    # Print the details of the model\n",
    "    print(\n",
    "        \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for inferencing\".format(\n",
    "            foundation_model.name, foundation_model.version, foundation_model.id\n",
    "        )\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary modules\n",
    "import time\n",
    "from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment\n",
    "\n",
    "# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name\n",
    "timestamp = int(time.time())\n",
    "online_endpoint_name = model_name[:13] + str(timestamp)\n",
    "print(f\"Creating online endpoint with name: {online_endpoint_name}\")\n",
    "\n",
    "# create an online endpoint\n",
    "endpoint = ManagedOnlineEndpoint(\n",
    "    name=online_endpoint_name,\n",
    "    description=f\"Online endpoint for {foundation_model.name}, for visual chat-completion task\",\n",
    "    auth_mode=\"key\",\n",
    ")\n",
    "workspace_ml_client.begin_create_or_update(endpoint).wait()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This code creates a deployment for the online endpoint.\n",
    "# It sets the deployment name, endpoint name, model, instance type, instance count, and request settings.\n",
    "# It also sets the liveness probe and readiness probe settings.\n",
    "# Finally, it updates the traffic distribution for the endpoint.\n",
    "\n",
    "\"\"\"\n",
    "from azure.ai.ml.entities import OnlineRequestSettings, ProbeSettings\n",
    "\n",
    "# create a deployment\n",
    "deployment_name = \"phi-3-vision\"\n",
    "demo_deployment = ManagedOnlineDeployment(\n",
    "    name=deployment_name,\n",
    "    endpoint_name=online_endpoint_name,\n",
    "    model=foundation_model.id,\n",
    "    instance_type=\"Standard_NC48ads_A100_v4\",\n",
    "    instance_count=1,\n",
    "    request_settings=OnlineRequestSettings(\n",
    "        request_timeout_ms=180000,\n",
    "        max_queue_wait_ms=500,\n",
    "    ),\n",
    "    liveness_probe=ProbeSettings(\n",
    "        failure_threshold=49,\n",
    "        success_threshold=1,\n",
    "        timeout=299,\n",
    "        period=180,\n",
    "        initial_delay=180,\n",
    "    ),\n",
    "    readiness_probe=ProbeSettings(\n",
    "        failure_threshold=10,\n",
    "        success_threshold=1,\n",
    "        timeout=10,\n",
    "        period=10,\n",
    "        initial_delay=10,\n",
    "    ),\n",
    ")\n",
    "workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()\n",
    "endpoint.traffic = {deployment_name: 100}\n",
    "workspace_ml_client.begin_create_or_update(endpoint).result()\n",
    "\"\"\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Test the endpoint with sample data\n",
    "\n",
    "We will send a sample request to the model, using the json that we create below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary modules\n",
    "import json\n",
    "import os\n",
    "\n",
    "# Define the test JSON payload\n",
    "test_json = {\n",
    "    \"input_data\": {\n",
    "        \"input_string\": [\n",
    "            {\n",
    "                \"role\": \"user\",\n",
    "                \"content\": [\n",
    "                    {\n",
    "                        \"type\": \"image_url\",\n",
    "                        \"image_url\": {\n",
    "                            \"url\": \"https://www.ilankelman.org/stopsigns/australia.jpg\"\n",
    "                        },\n",
    "                    },\n",
    "                    {\n",
    "                        \"type\": \"text\",\n",
    "                        \"text\": \"What is shown in this image? Be extremely detailed and specific.\",\n",
    "                    },\n",
    "                ],\n",
    "            },\n",
    "        ],\n",
    "        \"parameters\": {\"temperature\": 0.7, \"max_new_tokens\": 2048},\n",
    "    }\n",
    "}\n",
    "\n",
    "# Save the JSON object to a file\n",
    "sample_score_file_path = os.path.join(\".\", \"sample_chat_completions_score.json\")\n",
    "with open(sample_score_file_path, \"w\") as f:\n",
    "    json.dump(test_json, f, indent=4)\n",
    "\n",
    "# Print the input payload\n",
    "print(\"Input payload:\\n\")\n",
    "print(test_json)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import necessary modules\n",
    "import pandas as pd\n",
    "\n",
    "# score the sample_chat_completions_score.json file using the online endpoint with the azureml endpoint invoke method\n",
    "response = workspace_ml_client.online_endpoints.invoke(\n",
    "    endpoint_name=online_endpoint_name,\n",
    "    deployment_name=deployment_name,\n",
    "    request_file=sample_score_file_path,\n",
    ")\n",
    "print(\"Raw JSON Response: \\n\", response, \"\\n\")\n",
    "\n",
    "# Parse the JSON string\n",
    "json_data = json.loads(response)\n",
    "\n",
    "# Convert the parsed JSON to a DataFrame\n",
    "response_df = pd.DataFrame([json_data])\n",
    "print(\"Generated Text:\\n\", response_df[\"output\"].iloc[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Test the endpoint using Azure OpenAI style payload\n",
    "\n",
    "We will send a sample request with Azure OpenAI Style payload to the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This code defines a JSON payload for testing the online endpoint with Azure OpenAI style payload.\n",
    "# It includes the model name, a list of messages with user role and content (image URL and text),\n",
    "# temperature, and max_new_tokens.\n",
    "\n",
    "aoai_test_json = {\n",
    "    \"model\": foundation_model.name,\n",
    "    \"messages\": [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    \"image_url\": {\n",
    "                        \"url\": \"https://www.ilankelman.org/stopsigns/australia.jpg\"\n",
    "                    },\n",
    "                },\n",
    "                {\n",
    "                    \"type\": \"text\",\n",
    "                    \"text\": \"What is shown in this image? Be extremely detailed and specific.\",\n",
    "                },\n",
    "            ],\n",
    "        }\n",
    "    ],\n",
    "    \"temperature\": 0.7,\n",
    "    \"max_new_tokens\": 2048,\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the scoring uri\n",
    "scoring_uri = workspace_ml_client.online_endpoints.get(\n",
    "    name=online_endpoint_name\n",
    ").scoring_uri\n",
    "# Update the scoring uri to use for AOAI\n",
    "aoai_format_scoring_uri = scoring_uri.replace(\"/score\", \"/v1/chat/completions\")\n",
    "\n",
    "# Get the key for data plane operation\n",
    "data_plane_token = workspace_ml_client.online_endpoints.get_keys(\n",
    "    name=online_endpoint_name\n",
    ").primary_key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib.request\n",
    "import json\n",
    "\n",
    "# Prepare request\n",
    "body = str.encode(json.dumps(aoai_test_json))\n",
    "url = aoai_format_scoring_uri\n",
    "api_key = data_plane_token\n",
    "\n",
    "headers = {\"Content-Type\": \"application/json\", \"Authorization\": (\"Bearer \" + api_key)}\n",
    "req = urllib.request.Request(url, body, headers)\n",
    "\n",
    "# Send request & get response\n",
    "try:\n",
    "    response = urllib.request.urlopen(req)\n",
    "    result = response.read().decode(\"utf-8\")\n",
    "    print(result)\n",
    "except urllib.error.HTTPError as error:\n",
    "    print(\"The request failed with status code: \" + str(error.code))\n",
    "    # Print the headers - they include the requert ID and the timestamp, which are useful for debugging the failure\n",
    "    print(error.info())\n",
    "    print(error.read().decode(\"utf8\", \"ignore\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Delete the online endpoint\n",
    "Don't forget to delete the online endpoint, else you will leave the billing meter running for the compute used by the endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Delete Workspace\n",
    "workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait(#)"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
